gpt-oss-20b
Overall Accuracy: 25.5%
Answer Key: claude-opus-4-5-20251101
Boundary Models: 19
Pairs: 171
Total Rollouts: 855
Max Turns: 5
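The counts above fit together arithmetically: 19 boundary models yield 171 unordered pairs, and 855 rollouts divide evenly into 5 per pair. A minimal sketch of that check, assuming pairs are unordered combinations of two distinct models and that rollouts are spread evenly across pairs (the page states neither; the variable names are illustrative):

```python
from math import comb

# Figures copied from the summary above.
boundary_models = 19
reported_pairs = 171
total_rollouts = 855

# Assumption: a "pair" is an unordered choice of two distinct boundary models.
pairs = comb(boundary_models, 2)

# Assumption: rollouts are distributed evenly across pairs.
rollouts_per_pair, remainder = divmod(total_rollouts, pairs)

print(f"pairs = {pairs} (reported: {reported_pairs})")
print(f"rollouts per pair = {rollouts_per_pair}, remainder = {remainder}")
```

The page does not say whether the 5 rollouts per pair are related to the Max Turns value of 5, so the two figures should be read as separate settings.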
Pairwise Accuracy Matrix
Conversation Explorer
Boundary Model A / Boundary Model B (both selectors offer the same 19 models):
claude-haiku
claude-sonnet
claude-opus
llama-3.2-1b
llama-3.2-3b
gemma-3-4b-it
gemma-3-27b-it
qwen3-1.7b
qwen3-4b
qwen3-8b
qwen3-14b
qwen3-4b-instruct
qwen3-30b-instruct
olmo3-7b
olmo3-32b
mistral-3-3b
mistral-3-8b
mistral-3-14b
gpt-oss-20b